Flux-based classification of reactions reveals a functional bow-tie organization of complex metabolic networks

Unraveling the structure of complex biological networks and relating it to their functional role is 
an important task in systems biology. Here we attempt to characterize the functional organization 
of the large-scale metabolic networks of three microorganisms. We apply flux balance analysis to 
study the optimal growth states of these organisms in different environments. By investigating the 
differential usage of reactions across flux patterns for different environments, we observe a striking 
bimodal distribution in the activity of reactions. Motivated by this, we propose a simple algorithm 
to decompose the metabolic network into three sub-networks. It turns out that our reaction classifier 
which is blind to the biochemical role of pathways leads to three functionally relevant sub-networks 
that correspond to input, output and intermediate parts of the metabolic network with distinct 
structural characteristics. Our decomposition method unveils a functional bow-tie organization 
of metabolic networks that is different from the bow-tie structure determined by graph-theoretic 
methods that do not incorporate functionality. 



PACS numbers: 82.39.Rt, 87.18.Vf, 87.18.-h 



I. INTRODUCTION 

Biological systems provide many examples of the intricate 
relationship between the structure and functionality 
of complex networks [1-7]. Cellular metabolism is a 
complex biochemical network of several hundred metabolites 
that are processed and interconverted by enzyme-catalyzed 
reactions [8-13]. Metabolic networks have a 
dynamic flexibility that enables organisms to survive un- 
der diverse environmental conditions. A key goal of 
systems biology is to unveil the functional organization 
of metabolic networks explaining their system-level re- 
sponse to different environments. To this end, we have 
attempted to decompose metabolic networks into functionally 
relevant sub-networks. Flux balance analysis 
(FBA) has been widely used to harness the knowledge of 
large-scale metabolic networks and investigate 
genotype-phenotype relationships [13-16]. FBA has been 
successful in predicting the growth and deletion phenotypes of 
organisms [17-19]. Reaction fluxes carry information 
about the flows on metabolic networks and, as such, de- 
scribe the functional use of the network by the organism. 
In this paper, we have used this information to decom- 
pose the network into functionally relevant sub-networks. 

The paper is organized as follows: In section II we 
describe the modelling framework in which we study 
metabolic networks. In section III we discuss the classification 
of active reactions in metabolic networks into 
three categories by an algorithm that is blind to their 
biochemical roles. Section IV shows that the three categories 
are functionally relevant for the organism. In section V 
we compare the bow-tie architecture obtained by 
our functional classification of reactions with that ob- 
tained by graph-theoretic methods that do not employ 
functional information. In the last section we conclude 
with a summary. 



II. THE MODELLING FRAMEWORK 

A. Flux balance analysis (FBA) 

Flux balance analysis (FBA) is a computational approach 
widely used to analyze the capabilities of genome-scale 
metabolic networks [13-16]. The stoichiometric matrix S 
encapsulates the stoichiometric coefficients of different 
metabolites involved in various reactions of the 
metabolic network. FBA primarily uses structural infor- 
mation of the metabolic network contained in this ma- 
trix S to predict the possible steady state flux distribu- 
tion of all reactions and the maximum growth rate of an 
organism. In any metabolic steady state, the metabo- 
lites achieve a dynamic mass balance wherein the vector 
v of fluxes through the reactions satisfies the following 
equation representing the stoichiometric and mass bal- 
ance constraints: 



\[ S \cdot v = 0. \qquad (1) \]
TABLE I. Comparison of the three metabolic networks: E. coli, S. cerevisiae and S. aureus. 

Property                                                   E. coli       S. cerevisiae   S. aureus
Number of metabolites                                      [illegible]   [illegible]     [illegible]
Number of reactions in the model                           931           1149            641
Number of one-sided reactions in the equivalent network    1167          1576            863
Number of external metabolites                             143           116             84
Number of organic external metabolites (carbon sources)    131           107             68
Number of biomass metabolites                              49            42              [illegible]
Number of feasible minimal environments                    89            43              27
Number of active reactions                                 585           482             418
Number of reactions in category I                          185           89              84
Number of reactions in category IIa                        147           117             194
Number of reactions in category IIb                        42            46              28
Number of reactions in category III                        211           230             112

Equation (1) is an under-determined linear system of equations 
relating various reaction fluxes in genome-scale 
metabolic networks, leading to a large solution space of 
allowable fluxes. The space of allowable solutions can 
be reduced by incorporating thermodynamic and enzyme 
capacity constraints. To obtain a particular solution, linear 
programming is used to find a set of flux values (a 
particular flux vector v) that maximizes a biologically 
relevant linear objective function Z. The linear programming 
formulation of the FBA problem can be written as: 

\[ \max Z = \max \{ c^{T} v \mid S \cdot v = 0,\ a \le v \le b \}, \qquad (2) \]

where vectors a and b contain the lower and upper 
bounds of different fluxes in v and the vector c corre- 
sponds to the coefficients of the objective function Z. 
In FBA, the objective function Z is usually taken to be 
the growth rate of the organism. The environment, or 
medium, is defined in this approach by the components of 
a and b corresponding to the transport reactions, which 
determine, in particular, the set of metabolites whose 
uptake is allowed. 
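
As a minimal illustration of the formulation in Eq. (2), consider a hypothetical three-reaction chain that is not taken from any of the three models: an uptake reaction v_1 importing a nutrient A, an internal reaction v_2 converting A into B, and a biomass drain v_3 consuming B. The stoichiometric matrix, the bounds and the uptake limit of 10 flux units below are all assumed purely for illustration.

```latex
% Rows: metabolites A and B; columns: reactions v_1 (uptake), v_2 (A -> B), v_3 (biomass drain)
S = \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{pmatrix}, \qquad
c = (0, 0, 1)^{T}, \qquad
a = (0, 0, 0)^{T}, \qquad
b = (10, \infty, \infty)^{T}

% Steady state forces equal flux along the chain
S \cdot v = 0 \;\Longrightarrow\; v_1 = v_2 = v_3

% The only binding bound is the uptake limit, so the optimum is
\max Z = \max v_3 = 10
```

In this toy setting, changing the environment amounts to changing the bounds on the transport reaction: setting b_1 = 0 removes the nutrient and gives Z = 0.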



B. Large-scale metabolic networks 

In this work, we have analyzed the large-scale 
metabolic networks of three microorganisms: Escherichia 
coli (version iJR904 [20]), Saccharomyces cerevisiae 
(version iND750 [21]) and Staphylococcus aureus (version 
iSB619 [22]). Table I gives the number of metabolites 
and reactions in the metabolic networks of these three 
organisms. The metabolic networks contain internal and 
transport reactions. Internal reactions occur within the 
cell boundary. Transport reactions represent processes 
involving import or export of metabolites across the cell 
boundary. Each model also contains a pseudo biomass 
reaction that simulates the drain of various biomass 
precursor metabolites for growth in the specific organism. 
Starting from the published metabolic network, we obtain 
an equivalent reaction network as follows: Every 
reversible reaction in the network is converted into two 
one-sided (irreversible) reactions so that all reaction fluxes in 
the equivalent system are positive. A few reactions appear 
in duplicate in these networks, and only a single 
copy of each reaction is kept in the equivalent network. 
The equivalent metabolic network is a reaction set consisting 
of N unique one-sided reactions where N is 1167, 
1576 and 863 for E. coli, S. cerevisiae and S. aureus, 
respectively (cf. Table I). 
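
The construction of the equivalent network can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the representation of a reaction as a metabolite-to-coefficient dictionary, the function name `to_one_sided` and the toy network are hypothetical and are not taken from the published models.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (stoichiometry, reversible), where stoichiometry
# maps metabolite name -> coefficient (negative = substrate, positive = product).
Reaction = Tuple[Dict[str, float], bool]


def to_one_sided(reactions: List[Reaction]) -> List[Dict[str, float]]:
    """Split each reversible reaction into two one-sided reactions and keep one copy of duplicates."""
    seen = set()
    one_sided = []
    for stoich, reversible in reactions:
        directions = [stoich]
        if reversible:
            # The backward direction has every coefficient negated.
            directions.append({met: -coef for met, coef in stoich.items()})
        for direction in directions:
            key = frozenset(direction.items())  # order-independent signature
            if key not in seen:
                seen.add(key)
                one_sided.append(direction)
    return one_sided


toy_network = [
    ({"A": -1, "B": 1}, True),   # A <-> B
    ({"B": -1, "C": 1}, False),  # B -> C
    ({"B": 1, "A": -1}, True),   # duplicate entry of A <-> B
]
equivalent = to_one_sided(toy_network)
print(len(equivalent))  # 3 unique one-sided reactions
```

Here the duplicate entry contributes nothing new, so the three input reactions yield N = 3 one-sided reactions.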



C. Feasible minimal environments and associated 
flux vectors 

In this work, we have considered minimal environments 
each characterized by the presence of a limited amount 
of one organic carbon source (external nutrient metabo- 
lite) along with unlimited amounts of the following in- 
organic metabolites: ammonia, iron, potassium, pro- 
tons, pyrophosphate, sodium, sulfate, water and oxygen. 
The number of environments we consider for each organism 
thus coincides with the number of organic external 
metabolites (carbon sources) in its metabolic network (cf. 
Table I). We used FBA to determine the set of minimal 
environments supporting growth in the metabolic networks 
of E. coli, S. cerevisiae and S. aureus. A minimal 
environment is termed feasible if the growth rate predicted 
by FBA is nonzero for that carbon source. The 
number M of feasible minimal environments in E. coli, S. 
cerevisiae and S. aureus was found to be 89, 43 and 27, 
respectively (cf. Table I) [23]. For each organism, and for 
each feasible minimal environment for that organism, we 
obtained an N-dimensional optimal flux vector v that maximizes 
growth in the metabolic network of the organism, 
and whose component $v_j$ gives the flux of reaction j. For 
every organism, this led to a set of M flux vectors corresponding 
to the M feasible minimal environments, which 
were stored in the form of a matrix $V = (v_j^{\alpha})$ of dimensions 
N x M, where the rows (j = 1, 2, ..., N) correspond to different 
reactions in the network and the columns ($\alpha$ = 1, 2, ..., M) 
to different feasible minimal environments. $v_j^{\alpha}$ is defined 
as the flux of reaction j in the optimal flux vector v obtained 
for environment $\alpha$. 
D. Active reactions 

A given reaction j is termed active in an environment 
$\alpha$ if $v_j^{\alpha} > 0$. The activity m of a reaction denotes 
the number of minimal environments in which the reaction 
is active. The activity m for a reaction ranges from 
0 to M with M equal to 89, 43 and 27 for E. coli, S. 
cerevisiae and S. aureus, respectively. A reaction j is 
termed active in an organism if $m \geq 1$ (i.e., if it is active 
in at least one feasible minimal environment for that 
organism). The number of active reactions in E. coli, 
S. cerevisiae and S. aureus was found to be 585, 482 
and 418, respectively (cf. Table I). This paper primarily 
focuses on decomposing this set of active reactions into 
functionally relevant sub-networks. 



III. CLASSIFICATION OF ACTIVE 
REACTIONS 

We ask the question: How does the activity of a re- 
action vary across different environments? To address 
this question, we determine the frequency distribution of 
the activity of reactions in an organism. Fig. 1 shows 
the histogram of the activity of reactions in the E. coli 
metabolic network. The distribution is bimodal. Most 
reactions in E. coli are either once-active (m=1) or always 
active (m=89); the number of reactions for any 
given intermediate activity m in the range 1<m<89 is 
small. Thus, the largest number of active reactions in the 
metabolic network are used in either one environment or 
in all environments. The histograms of activity of reac- 
tions in S. cerevisiae and S. aureus also have a pattern 
similar to that in E. coli (cf. Fig. 1). The frequency dis- 
tribution of activity of reactions in the three organisms 
suggests a natural classification of active reactions into 
three categories: 

(a) Category I reactions or once-active reactions 
(m=1) 

(b) Category II reactions or always active reactions 
(m=M) 

(c) Category III reactions with intermediate activity 
(1<m<M) 
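
The three-way classification above can be sketched as follows. This is a minimal sketch, assuming the flux matrix is available as a list of rows (one row per reaction, one column per feasible minimal environment); the function name `classify_by_activity`, the numerical tolerance `tol` and the toy matrix are hypothetical choices made for illustration.

```python
from typing import List


def classify_by_activity(V: List[List[float]], tol: float = 1e-9) -> List[str]:
    """Label each reaction (row of V) by its activity m across the M environments (columns)."""
    labels = []
    for row in V:
        M = len(row)
        # Activity m: number of environments with nonzero (positive) flux.
        # The tolerance is an assumption to absorb solver round-off.
        m = sum(1 for flux in row if flux > tol)
        if m == 0:
            labels.append("inactive")
        elif m == 1:
            labels.append("I")    # once-active
        elif m == M:
            labels.append("II")   # always active
        else:
            labels.append("III")  # intermediate activity
    return labels


# Toy flux matrix with N = 4 reactions and M = 4 environments.
V_toy = [
    [5.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 1.5, 0.5],
    [0.0, 3.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
labels = classify_by_activity(V_toy)
print(labels)  # ['I', 'II', 'III', 'inactive']
```

The classifier uses only the pattern of nonzero fluxes and no biochemical annotation.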



A. Sub-classification based on correlation of 
reaction fluxes 

Clustering of gene expression data using the correla- 
tion coefficient has been successful in predicting regula- 
tory modules associated with a biological function across 
diverse conditions [24]. We used the correlation coefficient 
to identify sets of reactions whose fluxes are correlated 
across different environments. We used the set of 
M flux vectors corresponding to M feasible minimal environments 
contained in matrix $V = (v_j^{\alpha})$ to obtain the 
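matrix $C = (C_{jk})$, where $C_{jk}$ is the correlation coefficient 
between two active reactions j and k and is given by: 

\[ C_{jk} = \frac{1}{M} \sum_{\alpha=1}^{M} \frac{(v_j^{\alpha} - \langle v_j \rangle)(v_k^{\alpha} - \langle v_k \rangle)}{\phi_j \phi_k}, \qquad (3) \]

where $\phi_j = \sqrt{\frac{1}{M} \sum_{\alpha=1}^{M} (v_j^{\alpha} - \langle v_j \rangle)^2}$ and $\langle v_j \rangle$ is the mean flux of reaction j over the M environments. 
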

If $C_{jk} = 1$ then reactions j and k are perfectly correlated 
with each other in the given set of environments. 
Perfect clusters in metabolic networks are maximal sets 
of reactions that are perfectly correlated with each other 
pairwise. Perfect clusters are similar to enzyme subsets 
[25, 26], correlated reaction sets [13, 27, 28] or fully coupled 
sets [29], which have been used to detect modules in 
metabolic networks. 

We use Eq. (3) to identify perfect clusters in the metabolic 
networks of E. coli, S. cerevisiae and S. aureus. In particular, 
a large perfect cluster of 147 reactions was found 
in E. coli that is a subset of category II reactions. We refer 
to this subset of perfectly correlated reactions within 
category II as category IIa reactions. The remaining 42 
category II reactions that are always active but not perfectly 
clustered with category IIa reactions form 
category IIb. Similarly, large perfect clusters of sizes 117 
and 194 were found in category II reactions of S. cerevisiae 
and S. aureus, respectively. In Fig. 1, category IIa 
and IIb reactions are shown in pink and blue colours, respectively. 
We have shown elsewhere that perfect clusters 
are metabolic modules that can be explained by studying 
the connectivity of their constituent metabolites [23]. 
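
A sketch of how perfect clusters could be extracted from a flux matrix is given below. It assumes the standard Pearson form of the correlation coefficient and groups reactions greedily, which relies on perfect correlation being transitive. The function names, the tolerance `tol`, the treatment of constant-flux rows and the toy data are hypothetical choices for illustration, not the procedure used to produce the results reported here.

```python
import math
from typing import List


def correlation(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two flux profiles across environments."""
    M = len(x)
    mean_x = sum(x) / M
    mean_y = sum(y) / M
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / M
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / M)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / M)
    if sd_x == 0.0 or sd_y == 0.0:
        # Assumption: a constant profile has undefined correlation; treat as uncorrelated.
        return 0.0
    return cov / (sd_x * sd_y)


def perfect_clusters(V: List[List[float]], tol: float = 1e-9) -> List[List[int]]:
    """Group row indices of V whose profiles are perfectly correlated (C_jk = 1 within tol)."""
    clusters: List[List[int]] = []
    for j, row in enumerate(V):
        for cluster in clusters:
            # Comparing with one representative suffices because C_jk = 1 is transitive.
            if correlation(V[cluster[0]], row) >= 1.0 - tol:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters


V_toy = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],  # proportional to row 0
    [3.0, 1.0, 2.0],  # not perfectly correlated with row 0
    [0.5, 1.0, 1.5],  # proportional to row 0
]
clusters = perfect_clusters(V_toy)
print(clusters)  # [[0, 1, 3], [2]]
```

In this toy example the largest perfect cluster is the set {0, 1, 3}, which plays the role that category IIa plays within category II.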

Note that we have used a single optimal flux vector v 
obtained using FBA for each of the M feasible minimal 
environments to determine the activity of a reaction and 
the set of active reactions in the metabolic network of an 
organism. However, it is well known that there exist multiple 
flux vectors or alternate optimal solutions in most 
large-scale metabolic networks that maximize growth in 
a given environment [2?, 30-32]. In principle, due to the 
presence of alternate optima, the set of active reactions 
can change depending on the choice of the flux vectors. 
In Appendix A, we show the robustness of our reaction 
categories to the presence of alternate optima. 



B. Scatter plot of standard deviation versus mean 
flux of reactions across environments discriminates 
the three categories 

FIG. 1. The histogram of activity of reactions in the E. coli metabolic network. The bars show the number of 
reactions that have an activity m, where m ranges from 1 to 89 feasible minimal environments in the E. coli metabolic network. 
The green bar represents 185 category I reactions which are once-active. The pink bar represents 147 category IIa reactions (a 
subset of 189 always active category II reactions) that have fluxes perfectly correlated across environments. The deep blue bar 
represents 42 category IIb reactions that account for the remaining category II reactions. The light blue bars account for 211 
category III reactions with intermediate activity. Insets: Histograms of activity of reactions in S. cerevisiae and S. aureus. 
The three categories of reactions in S. cerevisiae and S. aureus were defined in a manner similar to E. coli. 

For each active reaction, following Almaas et al. [33], 
we have calculated the mean flux $\langle v \rangle$ and the standard 
deviation $\sigma$ around this mean by averaging the flux of the 
reaction over the M feasible minimal environments. Fig. 2 
shows the scatter plot of $\sigma$ versus $\langle v \rangle$ for active reactions 
in E. coli. In Fig. 2, categories I, II and III show up quite 
distinctly (upper line, category I; lower line, category IIa; 
with category IIb and category III largely in between the 
two lines). The upper line in Fig. 2 is the expected curve 
$\sigma = (M-1)^{1/2} \langle v \rangle$ for category I reactions and the lower 
line is the curve $\sigma = b \langle v \rangle$, where b is obtained via a best fit 
to the data for category IIa reactions. Appendix B gives the 
derivation of the relation between $\sigma$ and $\langle v \rangle$ for category I 
and IIa reactions. Thus, we find that the three categories 
of reactions are distinct from each other by virtue of their 
statistical properties. 
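
The category I line can be checked with a short calculation. The following is only a brief sketch (the full derivation is the subject of Appendix B), assuming that the standard deviation is the population standard deviation over the M environments and writing f for the (hypothetical) nonzero flux of a once-active reaction in its single active environment.

```latex
% A once-active reaction has flux f in one environment and 0 in the other M-1
\langle v \rangle = \frac{1}{M} \sum_{\alpha=1}^{M} v^{\alpha} = \frac{f}{M},
\qquad
\langle v^{2} \rangle = \frac{1}{M} \sum_{\alpha=1}^{M} (v^{\alpha})^{2} = \frac{f^{2}}{M}

\sigma^{2} = \langle v^{2} \rangle - \langle v \rangle^{2}
           = \frac{f^{2}}{M} - \frac{f^{2}}{M^{2}}
           = \frac{(M-1)\, f^{2}}{M^{2}}

\sigma = (M-1)^{1/2}\, \frac{f}{M} = (M-1)^{1/2}\, \langle v \rangle
```

For E. coli, M = 89 gives a slope of 88^{1/2}, approximately 9.38, independent of the flux value f; on a logarithmic scale all once-active reactions therefore fall on a single straight line.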



IV. FUNCTIONAL RELEVANCE OF THE 
THREE CATEGORIES OF REACTIONS 

Until now, our classification of active reactions into the 
three categories was motivated solely by the distribution of 
reaction activity in E. coli, S. cerevisiae and S. aureus, which 
has two very prominent peaks for once-active and always active 
reactions (cf. Fig. 1). We now show that the 
three categories I, II and III, obtained using a computational 
algorithm blind to the biochemical nature of pathways, 
correspond to the input, output and intermediate 
sub-networks, respectively. Thus, each category of reactions 
is a sub-network with a distinct functional role in 
metabolism. 



A. Category I: Fan-in of input pathways 

Fig. 3 shows the sub-network of all 185 category I 
reactions in E. coli. The figure shows a number of essen- 
tially linear paths of one to about five reactions starting 
from an external nutrient metabolite, often converging to 
some other metabolite. These are the input pathways of 
those metabolites, typically starting from their transport 
reaction that brings them into the cell, and subsequent 
catabolic reactions that break them down into a smaller 
set of metabolites. Input pathways of 86 out of the 89 ex- 
ternal nutrient metabolites (carbon sources) characteriz- 
ing different feasible minimal environments are contained 
in category I, thereby implying that category I essentially 
covers all the input pathways of metabolism. Similarly, 
we find that category I reactions in S. cerevisiae and S. 
aureus contain input pathways for most external nutri- 
ent metabolites characterizing different feasible minimal 
environments. Thus, category I essentially corresponds 
to the input part of the metabolic network. 



Fig. 4 shows a portion of category I reactions belong- 
ing to sugar input pathways in E. coli where several ex- 
ternal sugar metabolites converge downstream into a few 
intermediate metabolites. Thus, the input pathways in 
category I exhibit the fan-in property whereby diverse 
external nutrient metabolites are first catabolized into 
a smaller set of intermediate metabolites before being 
drawn into the interior of the metabolic network. Usually 
the external nutrients whose input pathways converge to 
a common metabolite belong to the same biochemical 
class (cf. Figures 3 and 4). Fig. 3 contains a num- 
ber of disconnected subgraphs each describing the input 
pathways of one or more biochemically similar metabo- 
lites; these disconnected paths get connected to the larger 
metabolic network via further downstream reactions that 
belong to other categories and are not shown in Fig. 3. 
FIG. 2. Standard deviation versus mean flux of active reactions in the E. coli metabolic network. The plot shows the 
standard deviation $\sigma$ versus mean flux $\langle v \rangle$ of the 585 active reactions in the E. coli metabolic network across M = 89 feasible 
minimal environments on a logarithmic scale. The green, pink, dark blue and cyan dots represent category I, IIa, IIb and III 
reactions, respectively. The three categories of reactions show up quite distinctly (upper line, category I; lower line, category 
IIa; with category IIb and category III in between the two lines). The upper line is the expected curve $\sigma = (M-1)^{1/2} \langle v \rangle$ 
for category I reactions. The lower line is the expected curve $\sigma = b \langle v \rangle$ for perfectly correlated category IIa reactions with 
b = 0.98 ± 0.1 obtained via best fit to the data. Insets: Scatter plots of $\sigma$ versus $\langle v \rangle$ of active reactions in S. cerevisiae and S. 
aureus metabolic networks. 




FIG. 3. Category I reactions in E. coli. This figure shows the bipartite graph of 185 category I reactions in E. coli. 
Rectangles represent reactions and ovals metabolites. External nutrient metabolites (organic carbon sources) are depicted in 
green and biomass metabolites in pink. For convenience, we have chosen to omit the high degree currency metabolites (such 
as ATP) from the figure in order to reduce clutter and focus on the biochemically relevant transformation in each reaction. 
Abbreviations of metabolites and reactions are as in the iJR904 model [23]. The figure was drawn using Graphviz software [34]. The 
high resolution electronic version of this figure can be zoomed in to read node labels and biochemical categories of boxes. We 
have classified the external metabolites and grouped together their input pathways in boxes based on biochemical similarity. 
FIG. 4. A small portion of category I sub-network in E. coli showing sugar input pathways. The figure shows 
category I reactions in the input pathways for external nutrient metabolites classified into the biochemical category 'Sugars'. 
Two kinds of sugars are shown here: monosaccharides and disaccharides. The input pathways for 7 external sugar metabolites 
fan-in downstream into 3 monosaccharide metabolites which occur at the boundary between category I and III sub-networks. 
Conventions are the same as in Figure 3. 




FIG. 5. Category IIa reactions in E. coli. This figure shows the graph of 147 category IIa reactions in E. coli whose reaction 
fluxes are perfectly correlated across minimal environments. Conventions are the same as in Figure 3. The preponderance of 
biomass metabolites (pink ovals) in this figure signifies that these reactions are at the output end of the metabolic network. 
The reactions have been grouped together into boxes based on common biosynthetic pathways. 



B. Category II: Output biosynthetic pathways 



A key biological function of the metabolic network 
is to convert nutrient metabolites in the environment 
into biomass metabolites required for growth and main- 
tenance of the cell. The biomass metabolites, which in- 
clude all the amino acids, nucleotides, lipids and certain 
cofactors, may be considered to be the output of the 
metabolic network. Category II reactions are always 
active and have a nonzero flux for any feasible minimal 
environment. We found that the category II sub-network 
has biosynthetic pathways for 30 out of the 49 biomass 
metabolites in E. coli. These pathways are typically the 
sole production pathways of those biomass metabolites 
in E. coli [23]. Thus, this sub-network is at the output 
end of metabolism. 

Of the 189 category II reactions in E. coli, 147 reactions 
belong to category IIa, whose fluxes are perfectly correlated 
across the different minimal environments. Fig. 5 
shows the graph of the category IIa sub-network in E. 
coli, which is the single largest perfect cluster of reactions. 
The remaining 42 reactions in category II constitute 
category IIb, which are always active but not 
perfectly correlated with category IIa reactions and with 
each other. Thus, the fluxes of category IIb reactions 
vary in a more complicated manner across minimal environments. 
Categories IIa and IIb exist with similar 
properties in the metabolic networks of the other two 
organisms (cf. Table I). In our previous work, we have 
shown that most of the category II reactions are essential 
for growth irrespective of the environment [23]. The 
set of category II reactions is a superset of the reactions in 
the activity core found earlier by Almaas et al. [35], which 
are reactions always used across minimal as well as rich 
environments. 

FIG. 6. Category III reactions in E. coli. This figure shows the network of reactions that are active in two or more of the minimal 
environments considered, but not in all the environments. Conventions are the same as in Figure 3. Comparing this graph of 
category III reactions with category I and IIa reactions (cf. Figures 3 and 5), it is evident that the category III sub-network has a 
highly reticulate structure with many loops. 



C. Category III: Intermediate pathways between 
input and output 

Fig. 6 shows the sub-network of category III reactions 
in E. coli, which are neither once-active nor always active; 
the activity of these reactions depends on the availability 
of nutrients in a more complicated manner. Category III 
reactions may be considered to constitute the intermedi- 
ate part of the network. By comparing the structures of 
the three categories, it is evident that category III has a 
highly reticulate and complex architecture compared to 
category I and II. There is a functional reason for the 
observed complexity in category III sub-network. The 
biomass metabolites collectively contain several different 
types of chemical structures (moieties), and the E. coli 
metabolic network is capable of producing these biomass 
metabolites from different minimal environments, each 
containing a different (and single) carbon source. A typ- 
ical external carbon source has one or a few moieties with 
different nutrients containing different subsets of moi- 
eties. Category I reactions transport the carbon sources 
into the cell and break them down into a small set of moieties. 
The function of category III reactions is to start 
with a small set of moieties and produce all the moieties 
required for biomass production. This requires a com- 
plex set of internal transformations and the exact set of 
transformations required depends on the nature of the 
input moieties. Thus, the activity of category III trans- 
forming reactions depends upon the biochemical nature 
of available nutrients in different minimal environments. 
We find that category III contains most of the reactions in 
central metabolism such as the citric acid cycle. A sim- 
ilar architecture of category III sub-network was found 
in the metabolic networks of the other two organisms as 
well. Some of the biomass metabolites are produced in 
category III itself. For the other biomass metabolites cat- 
egory III produces precursors which are then taken up in 
the biosynthetic pathways of category II to produce the 
biomass metabolites. 



V. COMPARISON OF FUNCTIONAL BOW-TIE 
DECOMPOSITION WITH GRAPH-THEORETIC 
BOW-TIE DECOMPOSITION 

Ma and Zeng [11, 36] have used graph-theoretic measures 
to reveal a bow-tie architecture of metabolic networks 
similar to that seen in the World Wide Web (WWW) 
[37], wherein the network can be decomposed into an 
in-component, out-component and a giant strong com- 
ponent. Given a directed graph, a strong component is 
a maximal subgraph such that for any pair of nodes i 
and j in the set there exists a directed path from i to 
j and from j to i within the subgraph. In general, a 
directed graph can have many strong components, and 
the strong component with the largest number of nodes 
is designated as the giant strong component (GSC). The 
associated in-component consists of nodes which have ac- 
cess to GSC nodes via some directed path, but cannot be 
reached from any GSC node via a directed path. The out- 
component consists of nodes which can be reached from 
the GSC nodes via some directed path, but lack access 
to any GSC node via a directed path. 
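
The graph-theoretic definitions above can be sketched in code. This is a minimal sketch using plain breadth-first reachability on a hypothetical toy graph; the function names are illustrative, the input is assumed to contain at least one edge, and this is not claimed to be the procedure used by Ma and Zeng.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def reachable(adj: Dict[str, List[str]], start: str) -> Set[str]:
    """Return all nodes reachable from start (including start) by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(edges: Iterable[Tuple[str, str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (GSC, in-component, out-component) of a directed graph given as an edge list."""
    edges = list(edges)
    nodes = {n for edge in edges for n in edge}
    fwd: Dict[str, List[str]] = {n: [] for n in nodes}
    bwd: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)
    gsc: Set[str] = set()
    assigned: Set[str] = set()
    for n in sorted(nodes):
        if n in assigned:
            continue
        # Strong component of n: nodes reachable from n that can also reach n.
        component = reachable(fwd, n) & reachable(bwd, n)
        assigned |= component
        if len(component) > len(gsc):
            gsc = component
    seed = next(iter(gsc))
    in_component = reachable(bwd, seed) - gsc   # can reach the GSC, not reachable from it
    out_component = reachable(fwd, seed) - gsc  # reachable from the GSC, cannot reach it
    return gsc, in_component, out_component


# Hypothetical toy graph: a feeds the cycle b -> c -> d -> b, which feeds e; x -> y is isolated.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "b"), ("d", "e"), ("x", "y")]
gsc, in_comp, out_comp = bow_tie(toy_edges)
print(sorted(gsc), sorted(in_comp), sorted(out_comp))  # ['b', 'c', 'd'] ['a'] ['e']
```

Note that this decomposition uses only connectivity: any reaction lying on a directed cycle through the GSC is assigned to the GSC regardless of how its flux behaves across environments.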

In this work, we have decomposed the metabolic net- 
work into three categories using a simple algorithm based 
on activity patterns of reactions across different minimal 
environments. Our categorization reveals a functional 
bow-tie architecture wherein the input pathways (cate- 
gory I reactions) fan into intermediate metabolism (cate- 
gory III reactions) which forms the knot of a bow-tie and 
from where the output pathways (category II reactions) 
for various biomass components fan out. 

In our functional bow-tie decomposition, the three cat- 
egories I, II and III of reactions discussed above broadly 
correspond to the in-component, out-component and 
GSC, respectively, of the graph-theoretic bow-tie decomposition 
by Ma and Zeng [11, 36]. However, the corresponding 
sets of reactions in the two decompositions 
differ in detail. For example, we find that the end prod- 
ucts of several (and often long) chains of reactions in 
the category II sub-network are re-cycled resulting in 
feedback loops. Such feedback loops in category II sub- 
network presumably minimize wastage and could be in- 
strumental in producing the biomass metabolites in the 
desired ratios. An example of such a feedback loop in 
category II sub-network is the one involving metabolite 
5mdr1p (which can be seen in the electronic version of 
Fig. 5 upon zooming). The biosynthetic pathways in- 
volved in such feedback loops appropriately belong to the 
output part of metabolism because they connect the pre- 
cursor metabolites to the outputs. However, the graph- 
theoretic bow-tie decomposition would classify such cat- 
egory II reactions in feedback loops into the GSC. Thus, 
our functional bow-tie decomposition based on fluxes of 
reactions across different environments gives better insight 
and is biochemically more realistic. The picture of 
the metabolic network our decomposition reveals is similar 
in spirit to the one envisioned by Csete and Doyle 
[12]. Further, it is important to note that our flux-based 
categorization does not involve the a priori exclusion of 
high degree currency metabolites as was needed in the 
graph-theoretic bow-tie decomposition of the metabolic 
network [11, 36]. 



VI. CONCLUSIONS 

In this paper, we have performed flux balance analy- 
sis (FBA) for the metabolic networks of three microor- 
ganisms: E. coli, S. cerevisiae and S. aureus to obtain 
fluxes of reactions in the network under diverse environmental 
conditions. We have followed a purely algorithmic 
approach, leveraging the predicted fluxes of 
reactions across different minimal environments to decompose 
the metabolic network into functionally relevant 
sub-networks. We find that the activity of a reaction 
given by the number of minimal environments for which 
it has a nonzero flux is an important indicator of the func- 
tional role of a reaction. We have classified the reactions 
into three functional categories based on their activity. 
Category I contains once-active reactions which are used 
in only one minimal environment. Most reactions belong- 
ing to the category I sub-network are uptake pathways 
for external nutrients in feasible minimal environments, 
and the primary function of these reactions is to catab- 
olize external nutrients into simpler metabolites which 
can be further processed by intermediary metabolism. 
Category II contains always active reactions which are 
used in all minimal environments. The category II sub- 
network is critical for the survival of the organism and 
accounts for the majority of the biosynthetic pathways 
for the production of the biomass metabolites at the out- 
put end of metabolic network. Category III contains re- 
actions which are used in an intermediate number of min- 
imal environments, and is responsible for generating the 
'precursor' molecules that are eventually converted into 
biomass metabolites by Category II reactions. We find 
that while category I and II sub-networks are dominated 
by simple linear pathways, the structure of the category 
III sub-network is highly reticulate. In summary, our de- 
composition method for large-scale metabolic networks 
based on activity of reactions captures the functional 
bow-tie organization proposed by Csete and Doyle: the input 
pathways (category I reactions) for nutrients in the 
environment fan into intermediate metabolism (category 
III reactions) which forms the knot of bow-tie from where 
the output biosynthetic pathways (category II reactions) 
for biomass components fan out. Our results are valid 
for metabolic networks of three phylogenetically different 
organisms (two distinct prokaryotes and a eukaryote), 
which suggests that the observed functional bow-tie or- 
ganization could be quite common in living systems. 



Appendix A: Robustness of categorization of 
reactions to alternate optimal solutions 

In this work, flux balance analysis (FBA) was used to
obtain a particular flux vector v or optimal solution that
maximizes the objective function taken as the growth
rate in a given minimal environment. However, for large- 
scale metabolic networks, there exist multiple flux vectors 
v or alternate optimal solutions that maximize growth 
in a given minimal environment, i.e., there are many flux
vectors v that have exactly the same value of the objective
function but use different pathways in the network
[2^], [30]-[32]. FBA finds one of many possible alternate
optima for a given minimal environment that maximizes
growth. In the main text, we have used a single
optimal flux vector v for each of the M feasible minimal 
environments to determine the activity of a reaction and 
the set of active reactions in the metabolic network of an 
organism. Since, in principle, the activity of a reaction 
can change depending on the particular flux vector con- 
sidered, we study the robustness of our categorization of 
reactions to the presence of alternate optima. 

Flux variability analysis (FVA) [31] can be used to determine
the set of reactions whose fluxes vary across alternate
optima for a given minimal environment. Specifically,
FVA determines the maximum and minimum flux
values that each reaction can take across alternate optima
for a given minimal environment. FVA involves the following
steps:

(a) Determine, using FBA, the maximum value of the
objective function Z, i.e., the growth rate $v^{a}_{\mathrm{biomass}}$, in a
given minimal environment a.

(b) Fix the flux of the biomass reaction equal to
$v^{a}_{\mathrm{biomass}}$.

(c) Change the objective function Z to be the flux of
a reaction j.

(d) Using linear programming, determine the maximum
flux value $v^{a}_{j,\max}$ of reaction j in the minimal
environment a, constraining the biomass reaction to
have a flux equal to $v^{a}_{\mathrm{biomass}}$.

(e) Using linear programming, determine the minimum
flux value $v^{a}_{j,\min}$ of reaction j in the minimal
environment a, constraining the biomass reaction to
have a flux equal to $v^{a}_{\mathrm{biomass}}$.

(f) The range $v^{a}_{j,\min}$ to $v^{a}_{j,\max}$ gives the variability of
the flux of reaction j across different alternate optima.

(g) Steps (c), (d), (e) and (f) can be repeated for
every reaction j in the metabolic network to determine
the flux variability of each reaction across
alternate optima for a given minimal environment
a.
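
The FVA steps listed above can be sketched with a generic linear programming solver. The example below is an illustration under stated assumptions, not the implementation used in this work: it relies on `scipy.optimize.linprog`, the four-reaction toy network and its bounds are hypothetical, and the small slack `tol` on the fixed biomass flux is an assumption added for numerical robustness.

```python
import numpy as np
from scipy.optimize import linprog


def fva(S, bounds, biomass, tol=1e-9):
    """Flux variability analysis for one environment.

    S       : stoichiometric matrix (metabolites x reactions)
    bounds  : list of (lower, upper) flux bounds encoding the environment
    biomass : column index of the biomass reaction
    Returns the optimal growth rate and a list of (v_min, v_max) per reaction.
    Assumes the environment is feasible, i.e. optimal growth exceeds tol.
    """
    n_mets, n_rxns = S.shape
    b_eq = np.zeros(n_mets)  # steady state: S v = 0

    # Step (a): FBA. linprog minimizes, so maximize growth via a -1 coefficient.
    c = np.zeros(n_rxns)
    c[biomass] = -1.0
    fba = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    if not fba.success:
        raise ValueError("FBA problem could not be solved")
    v_bio = fba.x[biomass]

    # Step (b): fix the biomass flux at its optimum (up to the assumed slack).
    fixed = list(bounds)
    fixed[biomass] = (v_bio - tol, v_bio)

    # Steps (c)-(g): minimize and maximize each reaction flux in turn.
    ranges = []
    for j in range(n_rxns):
        c = np.zeros(n_rxns)
        c[j] = 1.0
        lo = linprog(c, A_eq=S, b_eq=b_eq, bounds=fixed, method="highs")
        hi = linprog(-c, A_eq=S, b_eq=b_eq, bounds=fixed, method="highs")
        ranges.append((lo.fun, -hi.fun))
    return v_bio, ranges


# Hypothetical toy network: uptake -> A, two parallel routes A -> B, B -> biomass.
# Columns: uptake, route_1, route_2, biomass. Rows: metabolites A, B.
S = np.array([[1.0, -1.0, -1.0,  0.0],
              [0.0,  1.0,  1.0, -1.0]])
bounds = [(0, 10), (0, 10), (0, 10), (0, 1000)]

v_bio, ranges = fva(S, bounds, biomass=3)
for name, (vmin, vmax) in zip(["uptake", "route_1", "route_2", "biomass"], ranges):
    print(f"{name}: min={vmin:.3f}, max={vmax:.3f}")
```

In this toy network the two parallel routes are alternate pathways, so each can carry anywhere between none and all of the flux at optimal growth, whereas the uptake and biomass fluxes do not vary.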

We have used FVA to determine $v^{a}_{j,\max}$ and $v^{a}_{j,\min}$ for
each reaction j and for each feasible minimal environment
a in the E. coli metabolic network. A reaction
j is designated as blocked if $v^{a}_{j,\max}=0$ for all M feasible
minimal environments [29], [38]. We found 329 blocked reactions
in the E. coli metabolic network. The remaining
838 reactions, for which $v^{a}_{j,\max}>0$ for at least some environment
a, are designated as potentially active reactions.
This set includes the 585 active reactions considered in
the main text. We define a reaction j as essential for
a given minimal environment a if $v^{a}_{j,\min}>0$. We found 484
reactions to be essential for some a in the E. coli
metabolic network; these are a subset of the 585 active
reactions considered in the main text. We now classify
these 484 reactions into the following three categories:

(a) Essential category I: Reactions which satisfy
$v^{a}_{j,\min}>0$ for exactly one minimal environment. We
found 162 reactions in the E. coli metabolic network
to be in Essential category I. Of these, 153
reactions belong to category I of the main text.

(b) Essential category II: Reactions which satisfy
$v^{a}_{j,\min}>0$ for all M minimal environments. We
found 171 reactions in the E. coli metabolic network
to be in Essential category II. All of these
belong to category II of the main text.

(c) Essential category III: Reactions which satisfy
$v^{a}_{j,\min}>0$ for m minimal environments where
1<m<M. We found 151 reactions in the E. coli
metabolic network to be in Essential category III.
Of these, 145 belong to category III of the main
text.

Thus we find that the classification discussed in the main 
text which uses a particular flux vector correctly predicts 
the essential category I, II or III of 469 out of the 484 
essential reactions. 
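
The FVA-based designations used in this appendix can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the label strings, the toy FVA ranges and the tolerance `tol` are assumptions, and the label for potentially active reactions that are never essential is introduced here only for completeness. The last lines check that the counts reported above are mutually consistent.

```python
def classify_from_fva(fva_ranges, tol=1e-9):
    """Label reactions from their FVA ranges.

    fva_ranges maps a reaction name to a list of (v_min, v_max) pairs,
    one pair per feasible minimal environment. Assumes M >= 2.
    """
    labels = {}
    for rxn, ranges in fva_ranges.items():
        M = len(ranges)
        # Blocked: the maximum flux is zero in every environment.
        if all(vmax <= tol for _, vmax in ranges):
            labels[rxn] = "blocked"
            continue
        # m counts the environments in which the reaction is essential.
        m = sum(1 for vmin, _ in ranges if vmin > tol)
        if m == 0:
            labels[rxn] = "potentially active, never essential"
        elif m == 1:
            labels[rxn] = "essential I"
        elif m == M:
            labels[rxn] = "essential II"
        else:
            labels[rxn] = "essential III"
    return labels


# Hypothetical (v_min, v_max) ranges in M = 3 environments.
toy_ranges = {
    "r_blocked":  [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)],
    "r_optional": [(0.0, 4.0), (0.0, 0.0), (0.0, 2.0)],
    "r_once":     [(1.5, 1.5), (0.0, 0.0), (0.0, 3.0)],
    "r_always":   [(0.7, 0.7), (0.9, 0.9), (0.8, 0.8)],
    "r_some":     [(2.0, 5.0), (1.0, 1.0), (0.0, 0.0)],
}
labels = classify_from_fva(toy_ranges)
print(labels)

# Consistency of the counts reported in the text for E. coli.
essential = {"I": 162, "II": 171, "III": 151}
agreeing = {"I": 153, "II": 171, "III": 145}
total_essential = sum(essential.values())
total_agreeing = sum(agreeing.values())
print(total_essential, total_agreeing)
```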



Appendix B: Relation between standard deviation $\sigma$
and mean flux $\langle v \rangle$ for category I and category IIa
reactions

In Fig. 2, we plot the standard deviation $\sigma$ versus
mean flux $\langle v \rangle$ for active reactions in a metabolic network
across its M feasible minimal environments. Here, we
derive the relation between mean flux $\langle v \rangle$ and standard
deviation $\sigma$ for reactions in category I and category IIa
shown as upper and lower lines, respectively, in Fig. 2.



1. Category I reactions

In a given organism any reaction belonging to category
I has activity m=1, and is active for a single environment
(say $a_0$). The mean flux $\langle v_j \rangle$ of a category I reaction j
across M feasible environments is given by:

$$\langle v_j \rangle = \frac{1}{M} \sum_{a=1}^{M} v_j^{a} = \frac{v_j^{a_0}}{M}, \tag{B1}$$

where $v_j^{a}$ is the flux of reaction j in the environment a
(a = 1, 2, ..., M). $v_j^{a_0}$ is the flux of reaction j in the only
feasible minimal environment $a_0$ where the reaction has
nonzero value and in all other feasible minimal environments
the flux of reaction j is 0.

Thus, the standard deviation $\sigma_j$ for a category I reaction
j is given by:

$$\sigma_j = \sqrt{\frac{1}{M} \sum_{a=1}^{M} \left( v_j^{a} - \langle v_j \rangle \right)^2}
= \sqrt{\frac{1}{M} \left[ (M-1) \langle v_j \rangle^2 + \left( v_j^{a_0} - \langle v_j \rangle \right)^2 \right]}
= \sqrt{M-1}\, \langle v_j \rangle, \tag{B2}$$

where we have used the result in Eq. (B1).



2. Category IIa reactions

For two reactions j and k in category IIa, their fluxes
$v_j^{a}$ and $v_k^{a}$ in a given environment a are perfectly correlated
to each other. The fluxes of category IIa reactions
are proportional to each other having the same proportionality
constant for all minimal environments. For a
minimal environment a, we can write the flux of category
IIa reaction j as:

$$v_j^{a} = c^{a} v_j^{0}, \tag{B3}$$

where $c^{a}$ is a constant for the minimal environment a
and $v_j^{0}$ is some number. For any two reactions j and
k in category IIa with fluxes correlated across minimal
environments, we have:

$$\frac{v_j^{a}}{v_k^{a}} = \frac{c^{a} v_j^{0}}{c^{a} v_k^{0}} = \frac{v_j^{0}}{v_k^{0}} = \frac{v_j^{a'}}{v_k^{a'}}, \tag{B4}$$

where a and a' are two different feasible minimal environments
for the organism.
The mean flux of reaction j is:

$$\langle v_j \rangle = \frac{1}{M} \sum_{a=1}^{M} v_j^{a} = v_j^{0}\, \frac{1}{M} \sum_{a=1}^{M} c^{a} = v_j^{0} \langle c \rangle, \tag{B5}$$

where $\langle c \rangle$ is the mean of $c^{a}$ across the set of feasible
minimal environments.

The standard deviation $\sigma_j$ for category IIa reaction j
is given by:

$$\sigma_j = \sqrt{\frac{1}{M} \sum_{a=1}^{M} \left( v_j^{a} - \langle v_j \rangle \right)^2}
= v_j^{0} \sqrt{\frac{1}{M} \sum_{a=1}^{M} \left( c^{a} - \langle c \rangle \right)^2}
= v_j^{0} \sigma_c = \frac{\sigma_c}{\langle c \rangle} \langle v_j \rangle = b \langle v_j \rangle, \tag{B6}$$

where we have used the result in Eq. (B5), $\sigma_c$ is the standard
deviation of $c^{a}$ across the set of feasible minimal environments,
and $b = \sigma_c / \langle c \rangle$.
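
The two linear relations derived in this appendix, $\sigma_j = \sqrt{M-1}\,\langle v_j \rangle$ for category I and $\sigma_j = b \langle v_j \rangle$ with $b = \sigma_c / \langle c \rangle$ for category IIa, can be checked numerically. The sketch below uses only the Python standard library; the flux values, the constants `c` and the reference fluxes `v0` are hypothetical numbers chosen for illustration, and the standard deviation is the population form (normalized by M), which is the form under which the category I relation holds.

```python
import math
import statistics

# Category I: a reaction active in exactly one of M = 5 environments.
M = 5
cat1_fluxes = [0.0, 0.0, 3.0, 0.0, 0.0]
mean1 = statistics.fmean(cat1_fluxes)
sigma1 = statistics.pstdev(cat1_fluxes)       # population standard deviation
predicted1 = math.sqrt(M - 1) * mean1         # category I relation

# Category IIa: fluxes of the form v_j^a = c^a * v_j^0.
c = [1.0, 2.0, 0.5, 1.5, 1.0]                 # one constant per environment
b = statistics.pstdev(c) / statistics.fmean(c)
v0 = {"j": 2.0, "k": 7.0}                     # reaction-specific reference fluxes

ratios = {}
for rxn, v_ref in v0.items():
    fluxes = [ca * v_ref for ca in c]
    ratios[rxn] = statistics.pstdev(fluxes) / statistics.fmean(fluxes)

print(sigma1, predicted1)
print(b, ratios)
```

Both category IIa reactions give the same ratio of standard deviation to mean flux, which is why such reactions fall on a single line through the origin in a plot of $\sigma$ versus $\langle v \rangle$.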